Difference of Convex Functions Programming for Reinforcement Learning
Abstract
Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of the Optimal Bellman Residual (OBR) T*Q − Q, where T* is the so-called optimal Bellman operator. Controlling this residual allows controlling the distance to the optimal action-value function, and we show that minimizing an empirical norm of the OBR is consistent in the Vapnik sense. Finally, we frame this optimization problem as a DC program, which makes the large literature on DC Programming available for addressing the Reinforcement Learning problem.
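To make the residual T*Q − Q concrete, the following is a minimal sketch of how an OBR norm could be computed on a small tabular MDP. It is an illustration under stated assumptions, not the paper's algorithm: the two-state MDP (`R`, `P`, `gamma`), the uniform weighting over state-action pairs, and the function names are all hypothetical, and value iteration is used only to obtain a reference Q for which the residual vanishes.

```python
import numpy as np


def optimal_bellman_operator(Q, R, P, gamma):
    """Apply T* to a tabular Q.

    (T*Q)(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * max_b Q(s', b)
    Q, R have shape (S, A); P has shape (S, A, S).
    """
    greedy_values = Q.max(axis=1)          # max_b Q(s', b), shape (S,)
    return R + gamma * (P @ greedy_values)  # shape (S, A)


def obr_norm(Q, R, P, gamma, p=1):
    """Uniformly weighted l_p norm of the Optimal Bellman Residual T*Q - Q."""
    residual = optimal_bellman_operator(Q, R, P, gamma) - Q
    return float(np.mean(np.abs(residual) ** p) ** (1.0 / p))


def value_iteration(R, P, gamma, n_iter=500):
    """Reference solver: iterate T* from zero (used only for comparison)."""
    Q = np.zeros_like(R)
    for _ in range(n_iter):
        Q = optimal_bellman_operator(Q, R, P, gamma)
    return Q


# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
gamma = 0.9

Q_zero = np.zeros_like(R)
Q_star = value_iteration(R, P, gamma)

print("OBR norm at Q = 0 :", obr_norm(Q_zero, R, P, gamma))
print("OBR norm at Q*    :", obr_norm(Q_star, R, P, gamma))
```

The residual is large for an arbitrary Q and numerically zero at the fixed point of T*, which is the sense in which controlling the OBR controls the distance to the optimal action-value function. In the setting the abstract describes, the model `P` is not available and the norm would be estimated from sampled transitions instead.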
Similar resources
Difference of Convex Functions Programming Applied to Control with Expert Data
This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is DC. Improvement in performance is demonstrated on two specific algorithms, namely Reward-reg...
Optimality and Duality for an Efficient Solution of Multiobjective Nonlinear Fractional Programming Problem Involving Semilocally Convex Functions
In this paper, the problem under consideration is a multiobjective nonlinear fractional programming problem involving semilocally convex and related functions. We discuss the interrelation between the solution sets involving properly efficient solutions of the multiobjective fractional programming problem and the corresponding scalar fractional programming problem. Necessary and sufficient optimality...
On Sequential Optimality Conditions without Constraint Qualifications for Nonlinear Programming with Nonsmooth Convex Objective Functions
Sequential optimality conditions provide adequate theoretical tools to justify stopping criteria for nonlinear programming solvers. Here, nonsmooth approximate gradient projection and complementary approximate Karush-Kuhn-Tucker conditions are presented. These sequential optimality conditions are satisfied by local minimizers of optimization problems independently of the fulfillment of constrai...
Convex Generalized Semi-Infinite Programming Problems with Constraint Sets: Necessary Conditions
We consider generalized semi-infinite programming problems in which the index set of the inequality constraints depends on the decision vector and all emerging functions are assumed to be convex. Considering a lower level constraint qualification, we derive a formula for estimating the subdifferential of the value function. Finally, we establish the Fritz-John necessary optimality con...
Inequalities of Ando's Type for $n$-convex Functions
By utilizing different scalar equalities obtained via Hermite's interpolating polynomial, we obtain lower and upper bounds for the difference in Ando's inequality and in the Edmundson-Lah-Ribarič inequality for solidarities that hold for a class of $n$-convex functions. As an application, the main results are applied to some operator means and relative operator entropy.